Seeing Stars When There Aren’t Many Stars: Graph-Based Semi-Supervised Learning For Sentiment Categorization
نویسندگان
چکیده
We present a graph-based semi-supervised learning algorithm to address the sentiment analysis task of rating inference. Given a set of documents (e.g., movie reviews) and accompanying ratings (e.g., “4 stars”), the task calls for inferring numerical ratings for unlabeled documents based on the perceived sentiment expressed by their text. In particular, we are interested in the situation where labeled data is scarce. We place this task in the semi-supervised setting and demonstrate that considering unlabeled reviews in the learning process can improve ratinginference performance. We do so by creating a graph on both labeled and unlabeled data to encode certain assumptions for this task. We then solve an optimization problem to obtain a smooth rating function over the whole graph. When only limited labeled data is available, this method achieves significantly better predictive accuracy over other methods that ignore the unlabeled examples during training.
منابع مشابه
Semi-supervised Probabilistic Sentiment Analysis: Merging Labeled Sentences with Unlabeled Reviews to Identify Sentiment
Document level sentiment analysis, the task of determining whether the sentiment expressed in a document is positive or negative, is commonly performed by supervised methods. As with all supervised tasks, obtaining training data for these methods can be expensive and timeconsuming. Some semi-supervised approaches have been proposed that rely on sentiment lexicons. We propose a novel supervised ...
متن کاملSeeing Stars: Exploiting Class Relationships for Sentiment Categorization with Respect to Rating Scales
We address the rating-inference problem, wherein rather than simply decide whether a review is “thumbs up” or “thumbs down”, as in previous sentiment analysis work, one must determine an author’s evaluation with respect to a multi-point scale (e.g., one to five “stars”). This task represents an interesting twist on standard multi-class text categorization because there are several different deg...
متن کاملSentiment Classification in Under-Resourced Languages Using Graph-Based Semi-Supervised Learning Methods
In sentiment classification, conventional supervised approaches heavily rely on a large amount of linguistic resources, which are costly to obtain for under-resourced languages. To overcome this scarce resource problem, there exist several methods that exploit graph-based semisupervised learning (SSL). However, fundamental issues such as controlling label propagation, choosing the initial seeds...
متن کاملSemi-supervised vs. Cross-domain Graphs for Sentiment Analysis
The lack of labeled data always poses challenges for tasks where machine learning is involved. Semi-supervised and cross-domain approaches represent the most common ways to overcome this difficulty. Graph-based algorithms have been widely studied during the last decade and have proved to be very effective at solving the data limitation problem. This paper explores one of the most popular stateo...
متن کاملGraph-based approaches for semi-supervised and cross-domain sentiment analysis
The rapid development of Internet technologies has resulted in a sharp increase in the number of Internet users who create content online. Usergenerated content often represents people’s opinions, thoughts, speculations and sentiments and is a valuable source of information for companies, organisations and individual users. This has led to the emergence of the field of sentiment analysis, which...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2006